Comparing Performance and Portability between CUDA and SYCL for Protein Database Search on NVIDIA, AMD, and Intel GPUs
The heterogeneous computing paradigm has led to the need for portable and
efficient programming solutions that can leverage the capabilities of various
hardware devices, such as NVIDIA, Intel, and AMD GPUs. This study evaluates the
portability and performance of the SYCL and CUDA languages for one fundamental
bioinformatics application (Smith-Waterman protein database search) across
different GPU architectures, considering single and multi-GPU configurations
from different vendors. The experimental work showed that, while both CUDA and
SYCL versions achieve similar performance on NVIDIA devices, the latter
demonstrated remarkable code portability to other GPU architectures, such as
AMD and Intel. Furthermore, the architectural efficiency rates achieved on
these devices were superior in 3 of the 4 cases tested. This brief study
highlights the potential of SYCL as a viable solution for achieving both
performance and portability in the heterogeneous computing ecosystem.
Comment: This article was accepted for publication in the 2023 IEEE 35th
International Symposium on Computer Architecture and High Performance
Computing (SBAC-PAD).
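As a minimal sketch of the kind of kernel this study ports (not the authors' CUDA or SYCL code), the Python below scores a query against each database sequence with Smith-Waterman using affine gap penalties (Gotoh's formulation) and ranks the hits. The scoring scheme is an assumption: a toy match/mismatch score stands in for a substitution matrix such as BLOSUM62, and the penalty values and function names are illustrative. One common GPU strategy is to parallelize the outer loop across database sequences, since each pairwise score is independent.

```python
from typing import Dict, List, Tuple

NEG_INF = -10**9  # stands in for minus infinity in the gap matrices


def sw_score(query: str, target: str, match: int = 2, mismatch: int = -1,
             gap_open: int = 3, gap_extend: int = 1) -> int:
    """Best local alignment score (Smith-Waterman with Gotoh affine gaps).

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    Only two rows are kept, so memory is O(len(target)).
    """
    n = len(target)
    h_prev = [0] * (n + 1)        # H: best score ending at cell (i-1, j)
    f_prev = [NEG_INF] * (n + 1)  # F: best score ending in a vertical gap
    best = 0
    for a in query:
        h_cur = [0] * (n + 1)
        f_cur = [NEG_INF] * (n + 1)
        e = NEG_INF               # E: best score ending in a horizontal gap
        for j in range(1, n + 1):
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            diag = h_prev[j - 1] + (match if a == target[j - 1] else mismatch)
            h_cur[j] = max(0, diag, e, f_cur[j])  # 0 restarts a local alignment
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def search(query: str, database: Dict[str, str],
           top_k: int = 3) -> List[Tuple[str, int]]:
    """Score every database sequence against the query; return the top hits."""
    hits = [(name, sw_score(query, seq)) for name, seq in database.items()]
    hits.sort(key=lambda hit: (-hit[1], hit[0]))  # highest score first
    return hits[:top_k]


if __name__ == "__main__":
    db = {"seq1": "ACDEFG", "seq2": "WWWWWW", "seq3": "ACDXXEFG"}
    print(search("ACDEFG", db, top_k=2))  # [('seq1', 12), ('seq3', 8)]
```

Keeping only two rows of the score matrices is what makes the search feasible for long database sequences; a real implementation would also vectorize the inner loop.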
Assessing Opportunities of SYCL and Intel oneAPI for Biological Sequence Alignment
Background and objectives. The field of computational biology has been growing
over the years. The interest in researching and developing computational tools
for the acquisition, storage, organization, analysis, and visualization of
biological data creates the need for new hardware architectures and new
software tools that can process big data in acceptable times. In this sense,
heterogeneous computing plays an important role in providing solutions, but at
the same time it poses new challenges for developers, such as the inability to
port source code between different architectures.
Methods. Intel has recently introduced oneAPI, a new unified programming
environment that allows code developed in the SYCL-based Data Parallel C++
(DPC++) language to be run on different devices such as CPUs, GPUs, and FPGAs,
among others. Due to the large amount of CUDA software in the field of
bioinformatics, this paper presents the migration of the SW# suite, a
biological sequence alignment tool developed in CUDA, to DPC++ using the
oneAPI compatibility tool dpct (recently renamed SYCLomatic).
Results. SW# was completely migrated with little hand-coding by the
programmer. Moreover, it was possible to port
the migrated code between different architectures (considering different target
platforms and vendors), with no noticeable performance degradation.
Conclusions. The SYCLomatic tool achieved a high performance-portability
rate. SYCL and Intel oneAPI can offer attractive opportunities for the
bioinformatics community, especially considering the large body of
CUDA-based legacy code.
Monitoring and preliminary analysis of the natural responses recorded in a poorly accessible streambed spring located at a fluviokarstic gorge in Southern Spain
The analysis of natural responses (hydrodynamic, hydrothermal and hydrochemical) of karst springs is a well-established approach to gaining insight into the hydrogeological functioning of the aquifers they drain. However, a suitable program for monitoring these responses is often difficult to launch in poorly accessible streambed springs, due to the mixing of surface water and groundwater, in addition to topographic impediments. This work describes the installation of the measurement equipment and the preliminary hydrogeological dataset collected at the Charco del Moro spring (Southern Spain) over one year. This outlet emerges 5 m below the water surface, at the bottom of a partially flooded gorge, 20-200 m deep and 2 km long, eroded by the flow of the Guadiaro River. It is considered the largest discharge point in the region, draining groundwater from nearby carbonate outcrops to the north, although its catchment area has not yet been established. Continuous (hourly) monitoring of electrical conductivity, water temperature, turbidity and water level (discharge) reflects a high degree of heterogeneity in the duality of groundwater flow and storage dynamics, which is typical of karst conduit flow systems. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
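One simple way to quantify the variability of such hourly records is the coefficient of variation, CV = 100 · σ / μ, which is often applied to spring electrical conductivity in karst studies. This is an illustrative assumption: the abstract does not say which statistics the authors compute. The sketch below aggregates hourly readings into daily means and computes a population CV; the function names and the 24-readings-per-day layout are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def daily_means(hourly: List[float]) -> List[float]:
    """Average consecutive blocks of 24 hourly readings into daily values."""
    if len(hourly) % 24 != 0:
        raise ValueError("expected complete days of 24 hourly readings")
    return [mean(hourly[i:i + 24]) for i in range(0, len(hourly), 24)]


def coefficient_of_variation(values: List[float]) -> float:
    """Population coefficient of variation, in percent."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * pstdev(values) / mu


if __name__ == "__main__":
    # Hypothetical conductivity record (uS/cm): two days at 400, one day at 500.
    ec = [400.0] * 48 + [500.0] * 24
    daily = daily_means(ec)
    print(daily)  # [400.0, 400.0, 500.0]
    print(round(coefficient_of_variation(daily), 2))
```

A higher CV indicates a more variable (flashier) spring response; interpreting it in terms of conduit versus diffuse flow requires the site-specific analysis the paper describes.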
Customized Nios II multi-cycle instructions to accelerate block-matching techniques
This study focuses on accelerating motion estimation algorithms, which are widely used in video coding standards, by using both Altera Custom Instructions and an efficient combination of the SDRAM and on-chip memory of the Nios II processor. First, the code is fully profiled before optimization to detect the time bottlenecks affecting the motion compensation algorithms. Then, a multi-cycle Custom Instruction is implemented and added to the specific embedded design. The approach optimizes SoC performance by efficiently distributing the reset vector, exception vector, stack, heap, read/write data (.rwdata), read-only data (.rodata) and program text (.text) between on-chip memory and SDRAM. It further enhances these algorithms by incorporating Custom Instructions into the Nios II ISA. Finally, both methods are combined to build the final embedded system. The present contribution thus facilitates motion coding on low-cost soft-core microprocessors, particularly the Nios II RISC architecture implemented on FPGAs. It enables the construction of an SoC that processes 50×50 @ 180 fps.
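Block matching searches a reference frame for the displacement that best matches each block of the current frame. The sketch below is a plain full-search version, assuming the sum of absolute differences (SAD) as the matching criterion; SAD is a common choice, but the abstract does not name the criterion. The frame layout, block size, search radius and function names are hypothetical. The doubly nested SAD loop is the kind of hot spot that code profiling typically exposes and that a custom instruction could offload.

```python
from typing import List, Tuple

Frame = List[List[int]]  # frame[y][x] holds a luminance value


def sad(cur: Frame, ref: Frame, bx: int, by: int, rx: int, ry: int, n: int) -> int:
    """Sum of absolute differences between the n x n block of cur at (bx, by)
    and the n x n block of ref at (rx, ry)."""
    return sum(abs(cur[by + i][bx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                radius: int) -> Tuple[Tuple[int, int], int]:
    """Test every displacement within +/- radius and return the motion
    vector (dx, dy) with the lowest SAD, together with that SAD."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = sad(cur, ref, bx, by, bx, by, n), (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = bx + dx, by + dy
            if 0 <= rx <= width - n and 0 <= ry <= height - n:  # stay inside ref
                cost = sad(cur, ref, bx, by, rx, ry, n)
                if cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost


if __name__ == "__main__":
    ref = [[8 * y + x for x in range(8)] for y in range(8)]
    # Current frame: content of ref shifted by (2, 1), clamped at the borders.
    cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
    print(full_search(cur, ref, bx=2, by=2, n=2, radius=2))  # ((2, 1), 0)
```

Full search costs (2·radius + 1)² SAD evaluations per block, which is why it is a natural target for hardware acceleration.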
Smith-Waterman algorithm on heterogeneous systems: A case study
The well-known Smith-Waterman (SW) algorithm is a high-sensitivity method for local alignments. However, SW is expensive in terms of both execution time and memory usage, which makes it impractical in many applications. Some heuristics are possible, but at the expense of losing sensitivity. Fortunately, previous research has shown that new computing platforms such as GPUs and FPGAs are able to accelerate SW and achieve impressive speedups. In this paper we have explored SW acceleration on a heterogeneous platform equipped with an Intel Xeon Phi coprocessor. Our evaluation, using the well-known Swiss-Prot database as a benchmark, has shown that a hybrid CPU-Phi heterogeneous system is able to achieve competitive performance (62.6 GCUPS), even with moderate low-level optimisations. Facultad de Informática.
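GCUPS (giga cell updates per second), the metric behind the 62.6 GCUPS figure, counts how many dynamic-programming cells are filled per second; for a database search, the number of cells is the query length multiplied by the total number of residues in the database. The helpers below are a small sketch of that arithmetic; their names and the example workload are hypothetical.

```python
def gcups(query_len: int, db_residues: int, seconds: float) -> float:
    """Giga cell updates per second for one query against a whole database."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return query_len * db_residues / (seconds * 1e9)


def expected_seconds(rate_gcups: float, query_len: int, db_residues: int) -> float:
    """Time a search would take at a sustained rate of rate_gcups."""
    if rate_gcups <= 0:
        raise ValueError("rate must be positive")
    return query_len * db_residues / (rate_gcups * 1e9)


if __name__ == "__main__":
    # Hypothetical workload: a 500-residue query against 200 million residues.
    print(gcups(500, 200_000_000, 1.0))  # 100.0
    print(expected_seconds(62.6, 500, 200_000_000))
```

Because GCUPS normalizes by problem size, it allows runs with different queries and databases to be compared, which is why it is the standard figure of merit in this literature.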
State-of-the-art in Smith-Waterman Protein Database Search on HPC Platforms
Searching biological sequence databases is a common and repeated task in bioinformatics and molecular biology. The Smith–Waterman algorithm is the most accurate method for this kind of search. Unfortunately, this algorithm is computationally demanding, and the problem is compounded by the exponential growth of biological data in recent years. For that reason, the scientific community has made great efforts to accelerate Smith–Waterman biological database searches on a wide variety of hardware platforms. We give a survey of the state of the art in Smith–Waterman protein database search, focusing on four hardware architectures: central processing units, graphics processing units, field programmable gate arrays and Xeon Phi coprocessors. After briefly describing each hardware platform, we analyse the temporal evolution, contributions, limitations, experimental work and results of each implementation. Additionally, as energy efficiency is becoming increasingly important, we also survey performance/power consumption works. Finally, we give our view on the future of Smith–Waterman protein searches, considering upcoming generations of hardware architectures and their technologies. Instituto de Investigación en Informática. Universidad Complutense de Madrid.
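Performance/power comparisons usually rely on two derived quantities: throughput per watt (GCUPS/W) and energy per run (average power multiplied by elapsed time). The sketch below ranks platforms by GCUPS/W; the function names, platform labels and numbers in the example are hypothetical placeholders, not results from the surveyed works.

```python
from typing import Dict, List, Tuple


def gcups_per_watt(gcups: float, avg_power_w: float) -> float:
    """Energy efficiency: throughput divided by average power draw."""
    if avg_power_w <= 0:
        raise ValueError("power must be positive")
    return gcups / avg_power_w


def energy_joules(avg_power_w: float, seconds: float) -> float:
    """Energy consumed by one run at a given average power."""
    return avg_power_w * seconds


def rank_by_efficiency(
        measurements: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """measurements maps platform -> (GCUPS, average watts); most efficient first."""
    scores = [(name, gcups_per_watt(g, w)) for name, (g, w) in measurements.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    data = {"platform_a": (100.0, 250.0), "platform_b": (40.0, 50.0)}
    print(rank_by_efficiency(data))  # [('platform_b', 0.8), ('platform_a', 0.4)]
```

The example shows why both metrics matter: the platform with lower raw throughput can still be the more energy-efficient one.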
Evaluation of Intel's DPC++ Compatibility Tool in heterogeneous computing
The Intel DPC++ Compatibility Tool is a component of the Intel oneAPI Base Toolkit. This tool automatically transforms CUDA code into Data Parallel C++ (DPC++), thus assisting in the migration process. DPC++ is an implementation of the programming standard for heterogeneous computing known as SYCL, which unifies the development of parallel applications on CPUs, GPUs or even FPGAs.
This paper analyzes the DPC++ Compatibility Tool by considering the manual intervention required and the problems encountered while migrating the Rodinia benchmarks. For this suite, the tool achieves an impressive rate of almost 87% of code successfully migrated. Moreover, a comparative study of the performance of the migrated code was carried out, showing a moderate overhead in most of the migrated examples. Finally, a performance comparison on different devices was also performed.
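To make "automatically migrated" code and "manual intervention" concrete, the toy below rewrites a handful of CUDA built-ins into SYCL equivalents with regular expressions and flags every line that still contains CUDA constructs. It is a hypothetical illustration only: the real DPC++ Compatibility Tool is compiler-based and far more complete. The rule set, the `item` variable name and the migration-rate definition (share of CUDA-bearing lines that need no manual edit) are assumptions, not how the paper measures its 87% figure.

```python
import re
from typing import List, Tuple

# Hypothetical rewrite rules. The SYCL side assumes a 3-D sycl::nd_item named
# `item`; mapping CUDA's x dimension to index 2 mirrors the convention seen in
# dpct-generated code, where the rightmost SYCL dimension varies fastest.
RULES = [
    (re.compile(r"threadIdx\.x"), "item.get_local_id(2)"),
    (re.compile(r"blockIdx\.x"), "item.get_group(2)"),
    (re.compile(r"blockDim\.x"), "item.get_local_range(2)"),
    (re.compile(r"__syncthreads\(\)"), "sycl::group_barrier(item.get_group())"),
]
# Anything CUDA-looking that the rules above do not cover.
LEFTOVER = re.compile(r"\bcuda\w*|__\w+__|<<<")


def migrate(source: str) -> Tuple[str, List[int], float]:
    """Return (translated code, 1-based lines needing manual work,
    fraction of CUDA-bearing lines migrated automatically)."""
    out, manual, cuda_lines = [], [], 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        has_cuda = bool(LEFTOVER.search(line)) or any(p.search(line) for p, _ in RULES)
        for pattern, replacement in RULES:
            line = pattern.sub(replacement, line)
        if has_cuda:
            cuda_lines += 1
            if LEFTOVER.search(line):  # something the rules could not handle
                manual.append(lineno)
                line += "  // TODO: manual migration"
        out.append(line)
    rate = (cuda_lines - len(manual)) / cuda_lines if cuda_lines else 1.0
    return "\n".join(out), manual, rate


CUDA_SRC = """__global__ void add_one(float *a) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  __syncthreads();
  a[i] += 1.0f;
}
cudaDeviceSynchronize();"""

if __name__ == "__main__":
    text, manual, rate = migrate(CUDA_SRC)
    print(text)
    print(manual, rate)  # [1, 6] 0.5
```

Even this toy shows the pattern the paper reports: index and barrier built-ins map mechanically, while kernel qualifiers and runtime API calls need human attention.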
Formation of stellar inner discs and rings in spiral galaxies through minor mergers
Recent observations show that inner disks and rings (IDs and IRs) are not preferentially found in barred galaxies, pointing to the relevance of formation mechanisms different from the traditional bar-origin scenario. Nevertheless, the role of minor mergers in the formation of these inner components (ICs), while often invoked, is still poorly understood. We have investigated the capability of minor mergers to trigger the formation of IDs and IRs in spiral galaxies through collisionless N-body simulations. Our models prove that minor mergers are an efficient mechanism to form rotationally-supported stellar ICs in spirals, requiring neither strong dissipation nor noticeable bars, and suggest that their role in the formation of ICs must have been much more complex than just bar triggering.
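Collisionless N-body simulations follow particles that interact only through gravity, with force softening used to suppress artificial two-body encounters. As a generic sketch (not the authors' code, which the abstract does not describe), the Python below uses direct summation with Plummer softening and a kick-drift-kick leapfrog integrator. Units with G = 1, the softening length, the three-body setup and all function names are illustrative assumptions; production galaxy-merger runs use far more particles and tree or mesh force solvers.

```python
import math
from typing import List

Vec = List[float]


def accelerations(pos: List[Vec], mass: List[float], G: float = 1.0,
                  eps: float = 0.05) -> List[Vec]:
    """Direct-summation gravitational accelerations with Plummer softening eps."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                # Equal and opposite pulls, so total momentum is conserved.
                acc[i][k] += G * mass[j] * d[k] * inv_r3
                acc[j][k] -= G * mass[i] * d[k] * inv_r3
    return acc


def leapfrog(pos: List[Vec], vel: List[Vec], mass: List[float], dt: float,
             steps: int, G: float = 1.0, eps: float = 0.05):
    """Advance positions and velocities in place with kick-drift-kick leapfrog."""
    acc = accelerations(pos, mass, G, eps)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * acc[i][k]  # half kick
                pos[i][k] += dt * vel[i][k]        # full drift
        acc = accelerations(pos, mass, G, eps)
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * acc[i][k]  # half kick
    return pos, vel


def total_momentum(vel: List[Vec], mass: List[float]) -> Vec:
    return [sum(m * v[k] for m, v in zip(mass, vel)) for k in range(3)]


if __name__ == "__main__":
    # A massive "host" and two light "satellites" (hypothetical setup).
    mass = [1.0, 0.01, 0.01]
    pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    vel = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [-0.5, 0.0, 0.0]]
    before = total_momentum(vel, mass)
    leapfrog(pos, vel, mass, dt=0.01, steps=100)
    print(before, total_momentum(vel, mass))
```

Leapfrog is a common choice for such simulations because it is time-reversible and symplectic, so energy errors stay bounded over long integrations.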